{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# DEBUG: Inference performance comparison\n",
    "#  TFLite - Performance CPU only vs. Heterogeneous execution\n",
    "In this example notebook, we compare ***TFLite*** inference performance of a pre-trained classification model running on the CPU only (Cortex A**) vs. the same model running heterogeneously (Cortex A** + TIDL offload).\n",
    "\n",
    "   - The user can choose the model (see section titled *Choosing a Pre-Compiled Model*)\n",
    "   - The models used in this example were trained on the ***ImageNet*** dataset because it is a widely used dataset developed for training and benchmarking image classification AI models. \n",
    "   - We perform inference on one sample image.\n",
    "   \n",
    "## Choosing a Pre-Compiled Model\n",
    "We provide a set of precompiled artifacts for this notebook. They appear as a drop-down list once the first code cell is executed.\n",
    "\n",
    "<img src=docs/images/drop_down.PNG width=\"400\">\n",
    "\n",
    "**Note**: Users can run this notebook as-is; the only action required is to select a model."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "tags": [
     "parameters"
    ]
   },
   "outputs": [],
   "source": [
    "import os\n",
    "import cv2\n",
    "import numpy as np\n",
    "import ipywidgets as widgets\n",
    "from scripts.utils import get_eval_configs\n",
    "last_artifacts_id = selected_model_id.value if \"selected_model_id\" in locals() else None\n",
    "prebuilt_configs, selected_model_id = get_eval_configs('classification','tflitert', num_quant_bits = 8, last_artifacts_id = last_artifacts_id)\n",
    "display(selected_model_id)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "print(f'Selected Model: {selected_model_id.label}')\n",
    "config = prebuilt_configs[selected_model_id.value]\n",
    "config['session'].set_param('model_id', selected_model_id.value)\n",
    "config['session'].start()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Define utility function to preprocess input images\n",
    "\n",
    "Below, we define a utility function to preprocess images for the model. This function takes an image path as input, loads the image, and preprocesses it as required by the model. The steps below are shown as a reference (no user action required):\n",
    "\n",
    " 1. Load image\n",
    " 2. Convert BGR image to RGB\n",
    " 3. Resize image to the model input size\n",
    " 4. Apply per-channel mean subtraction and pixel scaling\n",
    " 5. Convert RGB image back to BGR, if the model expects BGR input\n",
    " 6. Convert the image to NCHW format, if the model expects it, and add a batch dimension\n",
    "\n",
    "\n",
    "- The input arguments of this utility function are selected automatically by this notebook based on the model selected in the drop-down"
   ]
  },
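  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "def preprocess(image_path, size, mean, scale, layout, reverse_channels):\n",
    "    # Step 1: load the image (OpenCV returns BGR, or None if the file cannot be read)\n",
    "    img = cv2.imread(image_path)\n",
    "    if img is None:\n",
    "        raise FileNotFoundError(f'Could not read image: {image_path}')\n",
    "\n",
    "    # Step 2: convert BGR to RGB\n",
    "    img = img[:,:,::-1]\n",
    "\n",
    "    # Step 3: resize to the model input size\n",
    "    img = cv2.resize(img, (size, size), interpolation=cv2.INTER_CUBIC)\n",
    "\n",
    "    # Step 4: per-channel mean subtraction and scaling\n",
    "    if mean is not None and scale is not None:\n",
    "        img = img.astype('float32')\n",
    "        # Distinct loop names keep the `mean` and `scale` arguments from being shadowed\n",
    "        for ch_mean, ch_scale, ch in zip(mean, scale, range(img.shape[2])):\n",
    "            img[:,:,ch] = (img[:,:,ch] - ch_mean) * ch_scale\n",
    "\n",
    "    # Step 5: convert RGB back to BGR if the model expects it\n",
    "    if reverse_channels:\n",
    "        img = img[:,:,::-1]\n",
    "\n",
    "    # Step 6: convert HWC to CHW for NCHW models, then add the batch dimension\n",
    "    if layout == 'NCHW':\n",
    "        img = np.expand_dims(np.transpose(img, (2,0,1)),axis=0)\n",
    "    else:\n",
    "        img = np.expand_dims(img,axis=0)\n",
    "\n",
    "    return img"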
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from scripts.utils import get_preproc_props\n",
    "\n",
    "size, mean, scale, layout, reverse_channels = get_preproc_props(config)    \n",
    "print(f'Image size: {size}')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Load and Run a model on ARM (Cortex A**) only\n",
    "\n",
    "The next cell runs the ***TFLite*** model on the Cortex A** CPU only and collects benchmark data."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import tflite_runtime.interpreter as tflite\n",
    "import matplotlib.pyplot as plt\n",
    "from scripts.utils import imagenet_class_to_name\n",
    "\n",
    "tflite_model_path = config['session'].get_param('model_file')\n",
    "artifacts_dir = config['session'].get_param('artifacts_folder')\n",
    "\n",
    "interpreter = tflite.Interpreter(model_path=tflite_model_path)\n",
    "interpreter.allocate_tensors()\n",
    "\n",
    "input_details = interpreter.get_input_details()\n",
    "output_details = interpreter.get_output_details()\n",
    "\n",
    "img_in = preprocess('sample-images/elephant.bmp', size, mean, scale, layout, reverse_channels)\n",
    "\n",
    "# Cast to uint8 when the model input is not float32 (e.g. a quantized model)\n",
    "if input_details[0]['dtype'] != np.float32:\n",
    "    img_in = np.uint8(img_in)\n",
    "\n",
    "interpreter.set_tensor(input_details[0]['index'], img_in)\n",
    "\n",
    "# Run inference several times to get a stable performance measurement\n",
    "for i in range(5):\n",
    "    interpreter.invoke()\n",
    "\n",
    "res = interpreter.get_tensor(output_details[0]['index'])\n",
    "\n",
    "print(f'\\nTop three results:')\n",
    "for idx, cls in enumerate(res[0].squeeze()[1:].argsort()[-3:][::-1]):\n",
    "    print('[%d] %s' % (idx, '/'.join(imagenet_class_to_name(cls))))\n",
    "\n",
    "from scripts.utils import plot_TI_performance_data, plot_TI_DDRBW_data, get_benchmark_output\n",
    "print(f'\\nPerformance CPU EP')\n",
    "stats = interpreter.get_TI_benchmark_data()\n",
    "fig, ax = plt.subplots(nrows=1, ncols=1, figsize=(10,5))\n",
    "plot_TI_performance_data(stats, axis=ax)\n",
    "plt.show()\n",
    "\n",
    "tt, st, rb, wb = get_benchmark_output(stats)\n",
    "print(f'Statistics : \\n Inferences Per Second    : {1000.0/tt :7.2f} fps')\n",
    "print(f' Inference Time Per Image : {tt :7.2f} ms  \\n DDR BW Per Image         : {rb+ wb : 7.2f} MB')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Load and Run a model in Heterogeneous mode\n",
    "\n",
    "The next cell runs the ***TFLite*** model in heterogeneous mode: the model runs on Cortex A** with subgraphs offloaded to TIDL through the ***`libtidl_tfl_delegate`*** delegate library. Benchmark data is shown at the end."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "scrolled": false
   },
   "outputs": [],
   "source": [
    "tflite_model_path = config['session'].get_param('model_file')\n",
    "artifacts_dir = config['session'].get_param('artifacts_folder')\n",
    "\n",
    "tidl_delegate = [tflite.load_delegate('libtidl_tfl_delegate.so', {'artifacts_folder': artifacts_dir})]\n",
    "\n",
    "interpreter = tflite.Interpreter(model_path=tflite_model_path, experimental_delegates=tidl_delegate)\n",
    "interpreter.allocate_tensors()\n",
    "\n",
    "input_details = interpreter.get_input_details()\n",
    "output_details = interpreter.get_output_details()\n",
    "\n",
    "img_in = preprocess('sample-images/elephant.bmp', size, mean, scale, layout, reverse_channels)\n",
    "\n",
    "# Cast to uint8 when the model input is not float32 (e.g. a quantized model)\n",
    "if input_details[0]['dtype'] != np.float32:\n",
    "    img_in = np.uint8(img_in)\n",
    "\n",
    "interpreter.set_tensor(input_details[0]['index'], img_in)\n",
    "\n",
    "# Run inference several times to get a stable performance measurement\n",
    "for i in range(5):\n",
    "    interpreter.invoke()\n",
    "\n",
    "res = interpreter.get_tensor(output_details[0]['index'])\n",
    "\n",
    "print(f'\\nTop three results:')\n",
    "for idx, cls in enumerate(res[0].squeeze()[1:].argsort()[-3:][::-1]):\n",
    "    print('[%d] %s' % (idx, '/'.join(imagenet_class_to_name(cls))))\n",
    "\n",
    "from scripts.utils import plot_TI_performance_data, plot_TI_DDRBW_data, get_benchmark_output\n",
    "print(f'\\nPerformance TFLite + TIDL delegates')\n",
    "stats = interpreter.get_TI_benchmark_data()\n",
    "fig, ax = plt.subplots(nrows=1, ncols=1, figsize=(10,5))\n",
    "plot_TI_performance_data(stats, axis=ax)\n",
    "plt.show()\n",
    "\n",
    "tt, st, rb, wb = get_benchmark_output(stats)\n",
    "print(f'Statistics : \\n Inferences Per Second    : {1000.0/tt :7.2f} fps')\n",
    "print(f' Inference Time Per Image : {tt :7.2f} ms  \\n DDR BW Per Image         : {rb+ wb : 7.2f} MB')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Final notes\n",
    "\n",
    "- With this notebook, users can quickly compare FPS when running their models only on Cortex A** vs. running them in heterogeneous mode.\n",
    "- If a model's accuracy or output is wrong in heterogeneous mode, a quick sanity check is to run the same model on Cortex A** only.\n",
    "- Accuracy can be improved by modifying the TIDL compilation options. For additional tips, see \"run and compare a model compiled with different compilation option\" in the debug_tips notebook."
   ]
  }
 ],
 "metadata": {
  "celltoolbar": "Tags",
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.6.13"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}
